Exploiting Similarities among Languages for Machine Translation
Abstract
Dictionaries and phrase tables are the basis of modern statistical machine translation systems. This paper develops a method that automates the process of generating and extending dictionaries and phrase tables. The method translates missing word and phrase entries by learning language structures from large monolingual data and a mapping between languages from small bilingual data. It uses distributed representations of words and learns a linear mapping between the vector spaces of the two languages. Despite its simplicity, the method is surprisingly effective: it achieves almost 90% precision@5 for translation of words between English and Spanish. The method makes few assumptions about the languages, so it can be used to extend and refine dictionaries and translation tables for any language pair.
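The core idea, a linear mapping between two word-vector spaces scored by precision@5, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how the mapping is fitted, so ordinary least squares is assumed here, the word vectors are synthetic random data, and the names `fit_linear_map`, `top_k_translations`, and `precision_at_k` are hypothetical.

```python
import numpy as np


def fit_linear_map(source_train, target_train):
    """Fit W minimising sum_i ||x_i W - z_i||^2 (rows are word vectors).

    Assumption: ordinary least squares; the abstract does not name an optimiser.
    """
    W, *_ = np.linalg.lstsq(source_train, target_train, rcond=None)
    return W


def top_k_translations(source_vector, W, target_vectors, k=5):
    """Return indices of the k target words closest (by cosine) to the mapped vector."""
    mapped = source_vector @ W
    mapped = mapped / np.linalg.norm(mapped)
    normed_targets = target_vectors / np.linalg.norm(
        target_vectors, axis=1, keepdims=True
    )
    similarities = normed_targets @ mapped
    return np.argsort(-similarities)[:k]


def precision_at_k(source_test, gold_indices, W, target_vectors, k=5):
    """Fraction of test words whose gold translation is among the top k candidates."""
    hits = 0
    for vector, gold in zip(source_test, gold_indices):
        if gold in top_k_translations(vector, W, target_vectors, k):
            hits += 1
    return hits / len(gold_indices)


# Synthetic stand-ins for monolingual embeddings (assumption: real vectors would
# come from models trained on large monolingual corpora).
rng = np.random.default_rng(0)
n_words, d_src, d_tgt = 200, 20, 15
source_vectors = rng.normal(size=(n_words, d_src))
hidden_map = rng.normal(size=(d_src, d_tgt))
target_vectors = source_vectors @ hidden_map + 0.01 * rng.normal(size=(n_words, d_tgt))

# Word i in the source vocabulary translates to word i in the target vocabulary.
# The first 150 pairs play the role of the small bilingual dictionary.
W = fit_linear_map(source_vectors[:150], target_vectors[:150])
p_at_5 = precision_at_k(
    source_vectors[150:], np.arange(150, n_words), W, target_vectors, k=5
)
print(f"precision@5 on held-out pairs: {p_at_5:.2f}")
```

Because the toy target space is constructed to be an almost exactly linear image of the source space, held-out precision here is far easier to obtain than on real embeddings; the sketch only shows the mechanics of fitting the map and ranking candidates.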
Related resources
Exploiting Structural Similarities of Philippine Languages for A Multilingual Machine Translation System
PinoyMMT, a multilingual machine translation system, was designed for Tagalog, Cebuano and English. It exploits structural similarities of the Philippine languages Tagalog and Cebuano, and handles the free word order phenomena. It has two modules: the analyzer and synthesizer. Analyzer parses the input sentence and converts it to its feature structure representation based on the rules and lexic...
Feature-based Decipherment for Large Vocabulary Machine Translation
Orthographic similarities across languages provide a strong signal for probabilistic decipherment, especially for closely related language pairs. The existing decipherment models, however, are not well-suited for exploiting these orthographic similarities. We propose a log-linear model with latent variables that incorporates orthographic similarity features. Maximum likelihood training is comput...
Rule-based Machine Translation between Indonesian and Malaysian
We describe the development of a bidirectional rule-based machine translation system between Indonesian and Malaysian (id-ms), two closely related Austronesian languages natively spoken by approximately 35 million people. The system is based on the re-use of free and publicly available resources, such as the Apertium machine translation platform and Wikipedia articles. We also present our appro...
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...
Rule Based Approach for Machine Translation System for Related Languages: Punjabi to Hindi
Machine Translation is one of the important areas in natural language processing. Machine Translation is a great challenge for closely related language pairs. A Machine Translation system for more or less related languages is based upon similarities such as syntax and vocabulary. Punjabi and Hindi both originated from the same parent language, so both are closely related and having lot o...
Journal:
- CoRR
Volume: abs/1309.4168
Pages: -
Publication date: 2013